Exploration of Drivers and Model Forms

Author
Affiliation

Gulf of Maine Research Institute

Published

March 21, 2024

Sea Surface Temperature and Size Change Expectations

Code
library(gt)
library(tidyverse)
library(gmRi)
library(tidyverse)
library(ggeffects)
library(zoo)
library(lme4)
library(mgcv)
library(patchwork)
library(emmeans)
library(rstatix)
library(ggpubr)


theme_set(theme_gmri() + theme(plot.margin = margin()) )

#--------------------
## Set up
#--------------------

# Load the data 
df <- read_csv(here::here("data/size_and_spectra_model_data.csv"))

# Get raw landings
landings_raw <- read_csv(here::here("data/unscaled_spectra_predictor_dataset_wide.csv")) %>% 
  select(year, survey_area, landings_raw = landings)

# Get the raw sst from oisst
sst_raw = read_csv(here::here("data/unscaled_regional_sst.csv"))

# Put it all together and clarify the column names
df <- df %>% 
  select(-sst) %>% 
  left_join(landings_raw) %>% 
  left_join(sst_raw) %>% 
  mutate(
    survey_area = factor(
      survey_area, 
      levels = c("Gulf of Maine", "Georges Bank", 
                 "Southern New England", "Mid-Atlantic Bight")),
    region = str_replace_all(survey_area, "-| ", "_"),
    region = factor(
      region, 
      levels = c("Gulf_of_Maine", "Georges_Bank", 
                 "Mid_Atlantic_Bight", "Southern_New_England")))

The following tabs detail how I’m thinking about whether to use SST in celsius or Celsius as anomalies to test our hypotheses about temperature’s impact on size and size structure:

Raw SST should (and does) increase with lower latitudes in the Northern hemisphere.

The temperature size rule does not predict that the mean/median size of a community should be correlated to the ambient temperature. Though this may be the case, this is not what it contends with directly, or that is not my understanding of it.

My thinking is that each regional community is likely composed of individuals that on-average are living near temperatures that are neither significantly “colder” or “warmer” than their preference. If there are mean/median differences in size, then I don’t think it is a manifestation of the TSR.

Code
p1 <- ggplot(df) +
  geom_point(aes(sst, med_len, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, med_len)) +
  labs(x = "SST", y = "Median Length", title = "Median length is Negatively\nCorrelated With SST")

p2 <- ggplot(df) +
  geom_point(aes(sst, med_wt, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, med_wt)) +
  labs(x = "SST", y = "Median Weight", title = "Median Weight is as Well")

p3 <- ggplot(df) +
  geom_point(aes(sst, b, color = survey_area)) +
  geom_smooth(method = "lm", aes(sst, b)) +
  labs(x = "SST", y = "Size Spectra Slope", title = "Size Spectra Slope Also Appears to Be...\nBut it is unclear why it would.") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect")

Anomalies are based off a 30-year climatology for each respective region, and should capture departures from the long-term average. This helps out with a problem of modeling SST x Region interactions, where the distributions of sst for the regions do not overlap well.

The TSR contends that growth rates for species reared under higher temperatures show a plastic response where growth rates at early ages are elevated, and maturation happens earlier and at smaller sizes.

The temperature-size rule (TSR) is a widespread phenomenon, which describes the phenotypic plastic response of species’ size to temperature: individuals reared at colder temperatures mature as larger adults than at warmer temperatures.

This effect IMO is better seen through anomalies, and I also would not expect sst anomalies to operate differently between regions since they are region-specific.

Code
p1 <- ggplot(df) +
  geom_boxplot(aes(x = survey_area, y = sst, color = survey_area)) +
  labs(x = "Area", y = "OISST", title = "There are Different Baseline SSTs\nAmong Regions")

p2 <- ggplot(df, aes(x = survey_area, y = sst_anom, color = survey_area)) + 
  geom_boxplot() +
  labs(title = "Anomalies (or scaled values)\nFrame in terms of regional average", x = "Area", y = "OISST Anomaly")


p3 <- ggplot(drop_na(df, sst), aes(year, sst_anom, color = survey_area)) +
  geom_point(alpha = 0.4) +
  geom_smooth(se = F, linewidth = 1) +
  labs(title = "All regions have elevated\nSST conditions in last decade", x = "Year", y = "OISST Anomalies")


p4 <- ggplot(drop_na(df, ersst), aes(year, ersst_anom, color = survey_area)) +
  geom_point(alpha = 0.4) +
  geom_smooth(se = F, linewidth = 1) +
  labs(title = "Departure from long-term trend\nstarts in 2000's", x = "Year", y = "ERSST Anomalies")


(p1 + p3) + plot_layout(guides = "collect")

Code
(p2 + p4) + plot_layout(guides = "collect")

These next plots explore the degree that SST anomalies are correlated with median size and spectra slope, and thoughts on what that relationship could/should be operating.

How do anomalies look with respect to these body-size metrics?

In a bit of a surprise to me, there does not seem to be any clear signal that any of these body-size related indicators are correlated with SSt anomalies.

Code
p1 <- ggplot(df, aes(sst_anom, med_len)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Median Length")

p2 <-  ggplot(df, aes(sst_anom, med_wt)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Median Weight")

p3 <- ggplot(df, aes(sst_anom, b)) +
  geom_point(aes(color = survey_area)) +
  geom_smooth(method = "lm", color = "black") +
  #geom_smooth(method = "lm", aes(color = survey_area), se = F) +
  labs(x = "SST Anomaly", y = "Size Spectra Slope") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect") + plot_annotation(title = "SST anomalies show no global correlation, could be an interaction situation, could be nothing...")

Its possible that maybe the regions would have some differential response. However, I wouldn’t anticipate them to behave differently under expectations from the TSR.

It also does not appear that there is much evidence for that either…

Code
p1 <- ggplot(df, aes(sst_anom, med_len, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Median Length")

p2 <-  ggplot(df, aes(sst_anom, med_wt, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Median Weight")

p3 <- ggplot(df, aes(sst_anom, b, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "SST Anomaly", y = "Size Spectra Slope") 


p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect") + plot_annotation(title = "SST Anomaly x Region Interaction")

I would almost expect SST anomalies to have a non-linear response shape of some dome-like shape mirroring a thermal preference curve.

This visual check isn’t very convincing of that though.

Code
ggplot(df, aes(sst_anom, b, color = survey_area)) +
  geom_point(aes()) +
  geom_smooth(methog = "gam") +
  facet_wrap(~survey_area, ncol = 2)

Should we Drop SST/SST Anomalies?

So we know that using SST itself has some problems:
1. SST from different regions have poor overlap
2. SST is colinear with the passage of time and with the declines in landings

From Bart:

Therefore, I think the best approach is to drop SST and focus on year as the predictor (and a proxy for SST).

My Thoughts:
Why would we want a proxy for SST when we have SST? Wouldn’t it be more prudent to make a statement on a lack of SST’s association with median size and spectra slope?

Including region and year brings in additional complications of collinearity. Each of the other variables shows trends through time. While it would be possible to follow a differencing procedure (e.g. remove the temporal trend) in these variables, I think a simpler and must easier approach is to just include year as both a predictor and a random effect, and remove all other covariates. What we are doing here is assuming that there is lots of stuff that differs with year, but we are going to push all that variation into the random effect component of the model in order to focus on the temporal trends by region.

My Thoughts:
I can see some of the rationale of going this route for the convenience of navigating the multicollineary of time+sst+landings.

To me, this option does not test either hypotheses for either landings or SST. I believe we’d have a decent model in terms of unbiased predictions and decent performance, but it seems like kind of a hands in the air for our original hypotheses and just focusing on whether there were changes over time.

Exploring years as a Random Effect

I’m also confused whether you can/should use year both as a fixed and a random. And also whether you are using it as a continuous variable for fixed and intended to use it as a categorical variable in the random effects term. I don’t think random effects can be continuous variables.

The model formula you had in your summary resembles what I would anticipate for having “year” as a continuous fixed effect.

Code
# Model summaries look the same if you enforce year as a factor for random effect or not
# ggpredict plots do not.
mod_regionXyear <- lmer(b ~ year*survey_area + (1|year), data = df)

cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", title = "Mixed-Effects Model:\nRegion Fixed Effects")+
  theme_classic()

p2 <- plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) + 
  ggtitle("Model Formula:\nyear * region + (1 | year)")

p1 / p2

This is what I imagine a random effect for year would produce.

Code
# Enforcing year as a factor
mod_regionXyear <- lmer(b ~ year*survey_area + (1|factor(year)), data = df)

cont <- emmeans::emtrends(mod_regionXyear, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", 
       title = "Mixed-Effects Model:\nRegion Fixed Effects")+
  theme_classic()

p2 <- plot(ggpredict(mod_regionXyear, ~year+survey_area), add.data = T) + 
  ggtitle("Model Formula:\nyear * region + (1 | factor(year))")

p1 / p2

Or use much simpler model?

If we care about the change over time differences I think we’d want year as a fixed effect. And indeed the coefficent estimates are the same.

Code
simple_mod <- lm(b ~ year*survey_area, data = df)
simple_cont <- emmeans::emtrends(simple_mod, pairwise ~ survey_area, var = "year")

p1 <- as.data.frame(simple_cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = year.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(x = "Coefficient value", y = "", title = "Fixed-Effects Only Model:\n Region Intercepts")+
  theme_classic()

p2 <- plot(ggpredict(simple_mod, ~year+survey_area), add.data = T) + ggtitle("Model Formula:\nyear * region")

p1 / p2

Proposing Alternative Random Effects Structure

I think there is some sense to have year as a R.E. It can help soak up some variance for things that happened to the four regions on any year that we don’t have measurements for (stratification dynamics, weather patterns, GSI, etc)

But I don’t know whether it should also be a fixed effect in the same model.

I don’t really care about whether regions have changed over time, our hypotheses are both focused on whether change can be associated with warming or fishing removals. So what if we swapped in sst?

Code
# New model:
mod_regionXsst_anom <- lmer(
  b ~ sst_anom * survey_area + (1|year), 
  data = df)

# Get contrasts
cont <- emmeans::emtrends(mod_regionXsst_anom, pairwise ~ survey_area, var = "sst_anom")


p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(survey_area), x = sst_anom.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Mixed-Effects Model Formula::\nb ~ sst_anom * region + (1 | year)\nRegion Intercepts")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(mod_regionXsst_anom, ~year+survey_area), add.data = T) + ggtitle("Year x Slope No Relationship")

p3 <- plot(ggpredict(mod_regionXsst_anom, ~sst_anom+survey_area), add.data = T) + ggtitle("sst_anom x Region Interaction Seems Like Something Now")


p1 / p2 / p3


What about landings?

Landings are the other covariate that we have a defined hypothesis for.

Landings are also collinear with year.

Landings also are highest for Georges Bank and Gulf of Maine before SST is available. So there is a data availability challenge for trying to work with the two covariates in addition to multicollinearity….

As far as a picking an informed model structure: - Similar value overlap issue that raw SST had, where the landings values span different ranges
- I would want to have a regional interaction, because the species targeted and the fishing means will be different, likely leading to different impacts

So if I were just doing landings I would probably start here:

Code
p1 <- ggplot(df,  aes(year, landings,  color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "Relative Regional Landings by Area")


p2 <- ggplot(df,  aes(landings, med_len, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "median length ~ s(landings_scaled)")

p3 <- ggplot(df,  aes(landings, b, color = survey_area)) +
  geom_point() +
  geom_smooth(method = "gam") +
  labs(title = "b ~ s(landings_scaled)")

p1 + p2 + p3 + guide_area() + plot_layout(ncol = 2, guides = "collect")

This suggests to me that:

harvest is not a strong selective force for smaller body sizes from 1970-2019 in the Northwestern Atlantic. With some potential interaction of importance in the Gulf of Maine.

Which could support the notion that:

Reductions in harvest are indeed weakening harvest induced declines in body size.

However, the anticipated potential increases in body size due to lower harvest are now masked by increases in temperature.

The data offers evidence that declines in body size in the fish community in the Northwest Atlantic are due to relative increases in regional sst since 1982.


Testing What we Care about:

Here is how I think I would proceed:

What do We Know:

Changes in median body length and size spectrum slope have occurred within different regions of the Northeast shelf.

These regions have experienced increases in average temperature.

Ecological literature would suggest that elevated temperatures could impact the size structure of the community via mechanisms related to TSR theory.

The Northeast US is also home to a number of large-scale commercial Fisheries. Fisheries also have been shown to impact the size structure of communities through removal of larger individuals and selective pressures.

For 3/4 of our regions these two covariates are colinear. Making it difficult to use either in a model together. They are also both colinear with year further complicating things.

Code
library(ggpmisc)

p1 <- ggplot(df, aes(sst_anom, landings)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p2 <- ggplot(df, aes(year, sst_anom)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p3 <- ggplot(df, aes(year, landings)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  stat_poly_eq() +
  scale_y_continuous(expand = expansion(add = c(0,2))) +
  facet_wrap(~survey_area)

p1 + p2 + p3 + plot_layout(ncol = 2) + plot_annotation(title = "Three-way colinearity")

Should we even use RE?

I think there is a good case to make for regions to be treated as random effects. With any of the following reasons contributing to samples within the regions not being statistically independent, among other possibilities:
- differential species composition
- different mixing/productivity regimes
- different fisheries

I just know less what this means in terms of any our hypotheses.

Code
length_noyr_mm <- lmer(med_len ~ (1 | survey_area) * landings + sst_anom, data = df )

Moving Forward: Hypothesis Based Model Structures

From a hypothesis standpoint I would anticipate the following: 1. SSt anomalies should have a global non-linear impact effect that is stable across regions (no interaction with region)
- Landings could very likely differ between region due to different fisheries, and different community compositions (interaction with landing)
- Regions have different community structure, bathymetry, current dynamics, and connectedness to different sized nearshore habitats. I would expect each to vary (intercept)

Modeling without Year

From the above hypotheses, this is the linear model structure I would use. Year is not included as a term in these models.

Median Length Model:

Code
# Linear model with our primary hypotheses
length_noyr <- lm(med_len ~ region * landings + sst_anom, data = df )

summary(length_noyr)

Call:
lm(formula = med_len ~ region * landings + sst_anom, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-14.8119  -2.4864  -0.3935   2.4744  12.3483 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                          26.46099    0.72904  36.296   <2e-16 ***
regionGeorges_Bank                   -0.09738    1.02173  -0.095   0.9242    
regionMid_Atlantic_Bight            -10.48518    1.01415 -10.339   <2e-16 ***
regionSouthern_New_England           -9.92448    1.00205  -9.904   <2e-16 ***
landings                              1.56107    0.80875   1.930   0.0556 .  
sst_anom                             -0.11020    0.61238  -0.180   0.8574    
regionGeorges_Bank:landings          -0.96418    1.03600  -0.931   0.3536    
regionMid_Atlantic_Bight:landings    -2.16327    1.04527  -2.070   0.0403 *  
regionSouthern_New_England:landings  -1.71460    0.99528  -1.723   0.0871 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.118 on 142 degrees of freedom
  (49 observations deleted due to missingness)
Multiple R-squared:  0.6101,    Adjusted R-squared:  0.5881 
F-statistic: 27.77 on 8 and 142 DF,  p-value: < 2.2e-16
Code
# Get contrasts
cont <- emmeans::emtrends(length_noyr, pairwise ~ region, var = "landings")

# Plot contrasts
p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(region), x = landings.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Fixed-Effects lm Formula::\nmed_length ~ sst_anom + landings * region")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(length_noyr, ~sst_anom), add.data = T) + labs(y = "Median Length", x = "SST Anomaly")

p3 <- plot(ggpredict(length_noyr, ~landings*region), add.data = T)  + labs(y = "Median Length", x = "Standardized Regional landings")


p1 / p2 / p3

Code
# Augment with Model fit and residuals:
df_aug <- augment(length_noyr, drop_na(df, sst_anom, landings)) %>% 
  mutate(resid_col = ifelse(.resid > 0, "darkred", "royalblue"))


# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, med_len, color = "Observed")) +
  geom_line(aes(year, .fitted, color = "Model Fit")) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Model fits")

Code
# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, .resid), color = "gray") +
  geom_point(aes(year, .resid, color = I(resid_col))) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Temporal Patterns in Residuals")

Spectra Slope Model:

Code
# Linear model with our primary hypotheses
b_noyr <- lm(b ~ region * landings + sst_anom, data = df )

summary(b_noyr)

Call:
lm(formula = b ~ region * landings + sst_anom, data = df)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.160563 -0.036144 -0.007886  0.033354  0.139481 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         -0.922175   0.009809 -94.013  < 2e-16 ***
regionGeorges_Bank                  -0.012645   0.013747  -0.920   0.3592    
regionMid_Atlantic_Bight            -0.112674   0.013645  -8.258 9.35e-14 ***
regionSouthern_New_England          -0.117426   0.013482  -8.710 7.09e-15 ***
landings                             0.017087   0.010881   1.570   0.1186    
sst_anom                            -0.003155   0.008239  -0.383   0.7024    
regionGeorges_Bank:landings         -0.008689   0.013939  -0.623   0.5340    
regionMid_Atlantic_Bight:landings   -0.034807   0.014064  -2.475   0.0145 *  
regionSouthern_New_England:landings -0.007334   0.013391  -0.548   0.5848    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.05541 on 142 degrees of freedom
  (49 observations deleted due to missingness)
Multiple R-squared:  0.5207,    Adjusted R-squared:  0.4937 
F-statistic: 19.28 on 8 and 142 DF,  p-value: < 2.2e-16
Code
# Get contrasts
cont <- emmeans::emtrends(b_noyr, pairwise ~ region, var = "landings")

# Plot contrasts
p1 <- as.data.frame(cont$emtrends) %>%
  ggplot(aes(y = fct_rev(region), x = landings.trend))+
  geom_pointrange(aes(xmin = lower.CL, xmax = upper.CL))+
  geom_vline(xintercept = 0, linetype = 4)+
  labs(
    x = "Coefficient value", 
    y = "", 
    title = "Fixed-Effects lm Formula::\nb ~ sst_anom + landings * region")+
  theme_classic()


# Predictions
p2 <- plot(ggpredict(b_noyr, ~sst_anom), add.data = T) + labs(y = "Median Length", x = "SST Anomaly")

p3 <- plot(ggpredict(b_noyr, ~landings*region), add.data = T)  + labs(y = "Median Length", x = "Standardized Regional landings")


p1 / p2 / p3

Code
# Augment with Model fit and residuals:
df_aug <- augment(b_noyr, drop_na(df, sst_anom, landings)) %>% 
  mutate(resid_col = ifelse(.resid > 0, "darkred", "royalblue"))


# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, b, color = "Observed")) +
  geom_line(aes(year, .fitted, color = "Model Fit")) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Model fits")

Code
# Temporal Patterns in Residuals
ggplot(df_aug) +
  geom_hline(yintercept = 0) +
  geom_line(aes(year, .resid), color = "gray") +
  geom_point(aes(year, .resid, color = I(resid_col))) +
  facet_wrap(~region) +
  labs(y = "Residuals", title = "Temporal Patterns in Residuals")


Testing Change Over Time: Have Regions Seen Changes in Slope/Size

This would be preliminary/supplemental exploration of whether median size or size spectra slopes have changed within regions.

These would be straightforward linear models with a region interaction. They provide validity to statements of change over time and differences between regions, but do not speak to hypotheses on why those changes have occurred.

Code
# Need a slope/intercept  interaction between year and region
length_change_lm <- lm(med_len ~ year * region, data = df)

# # Test homegeneity of slopes - for ANCOVA
# df %>% anova_test(med_len ~ region * year)

# They aren't homogenous - pairwise test for differences
slopes_lst <- emtrends(length_change_lm, ~ region, var = "year")
slopes_lst %>% as.data.frame() %>%  gt()          # slope estimates and CIs
region year.trend SE df lower.CL upper.CL
Gulf_of_Maine -0.14492197 0.04084622 192 -0.2254869051 -0.064357032
Georges_Bank -0.07183673 0.04084622 192 -0.1524016710 0.008728202
Mid_Atlantic_Bight 0.07984508 0.04084622 192 -0.0007198603 0.160410012
Southern_New_England -0.03015606 0.04084622 192 -0.1107209987 0.050408874
Code
pairs(slopes_lst) %>% as.data.frame() %>% gt()   # comparisons
contrast estimate SE df t.ratio p.value
Gulf_of_Maine - Georges_Bank -0.07308523 0.05776527 192 -1.2652105 0.5861883421
Gulf_of_Maine - Mid_Atlantic_Bight -0.22476704 0.05776527 192 -3.8910409 0.0007878996
Gulf_of_Maine - Southern_New_England -0.11476591 0.05776527 192 -1.9867629 0.1965640418
Georges_Bank - Mid_Atlantic_Bight -0.15168181 0.05776527 192 -2.6258303 0.0457521357
Georges_Bank - Southern_New_England -0.04168067 0.05776527 192 -0.7215524 0.8884122646
Mid_Atlantic_Bight - Southern_New_England 0.11000114 0.05776527 192 1.9042779 0.2296619198
Code
#---- below might depend on slope assumption  ------

# ANOVA test for between significance
res.aov <- df %>% anova_test(med_len ~ year + region)
get_anova_table(res.aov)  %>% as.data.frame() %>% gt()
Effect DFn DFd F p p<.05 ges
year 1 195 3.924 4.90e-02 * 0.020
region 3 195 97.198 1.65e-38 * 0.599
Code
# Pairwise comparisons


# Get pairwise results for the factors
pwc <- df %>% 
  emmeans_test(
    med_len ~ region, 
    covariate = year,
    p.adjust.method = "bonferroni") %>% 
  add_xy_position(
    x = "region", fun = "mean_se")


# Display the adjusted means of each group
# Also called as the estimated marginal means (emmeans)
get_emmeans(pwc) %>% 
  ggplot(aes(emmean, fct_rev(region))) + 
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  theme_gmri() +
  labs(
    x = "Median Length",
    y = "Region",
    title = "GB + GOM significantly Longer Lengths than MAB & SNE\n
                Not significantly different within the pairs")

Code
# Visualization: line plots with p-values
get_emmeans(pwc) %>% 
  left_join(select(df, survey_area, region)) %>% 
  ggline(x = "survey_area", y = "emmean") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) + 
  stat_pvalue_manual(pwc, hide.ns = TRUE, tip.length = FALSE) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)) +
  labs(y = "Median Length (cm)", x = "Region")

Code
# Need a slope/intercept  interaction between year and region
b_change_lm <- lm(b ~ year * region, data = df)

# Test homegeneity of slopes - for ANCOVA
df %>% anova_test(b ~ region * year)
Effect DFn DFd F p p<.05 ges
region 3 192 81.815 0.0e+00 * 0.561
year 1 192 6.132 1.4e-02 * 0.031
region:year 3 192 10.293 2.6e-06 * 0.139
Code
# They aren't homogenous - pairwise test for differences
slopes_lst <- emtrends(b_change_lm, ~ region, var = "year")
slopes_lst  %>% as.data.frame() %>% gt()         # slope estimates and CIs
region year.trend SE df lower.CL upper.CL
Gulf_of_Maine -0.0024661041 0.0005430654 192 -0.0035372444 -0.0013949638
Georges_Bank -0.0012330285 0.0005430654 192 -0.0023041688 -0.0001618882
Mid_Atlantic_Bight 0.0016887289 0.0005430654 192 0.0006175886 0.0027598692
Southern_New_England -0.0006792533 0.0005430654 192 -0.0017503936 0.0003918869
Code
pairs(slopes_lst)  %>% as.data.frame() %>% gt()   # comparisons
contrast estimate SE df t.ratio p.value
Gulf_of_Maine - Georges_Bank -0.0012330756 0.0007680105 192 -1.6055453 3.779217e-01
Gulf_of_Maine - Mid_Atlantic_Bight -0.0041548329 0.0007680105 192 -5.4098651 1.111432e-06
Gulf_of_Maine - Southern_New_England -0.0017868507 0.0007680105 192 -2.3265969 9.559426e-02
Georges_Bank - Mid_Atlantic_Bight -0.0029217573 0.0007680105 192 -3.8043197 1.088374e-03
Georges_Bank - Southern_New_England -0.0005537751 0.0007680105 192 -0.7210516 8.886152e-01
Mid_Atlantic_Bight - Southern_New_England 0.0023679822 0.0007680105 192 3.0832682 1.247060e-02
Code
#---- below might depend on slope assumption  ------

# ANOVA test for between significance
res.aov <- df %>% anova_test(b ~ year + region)
get_anova_table(res.aov)   %>% as.data.frame() %>% gt()
Effect DFn DFd F p p<.05 ges
year 1 195 5.365 2.20e-02 * 0.027
region 3 195 71.582 2.94e-31 * 0.524
Code
# Pairwise comparisons


# Get pairwise results for the factors
pwc <- df %>% 
  emmeans_test(
    b ~ region, 
    covariate = year,
    p.adjust.method = "bonferroni") %>% 
  add_xy_position(
    x = "region", fun = "mean_se")


# Display the adjusted means of each group
# Also called as the estimated marginal means (emmeans)
get_emmeans(pwc) %>% 
  ggplot(aes(emmean, fct_rev(region))) + 
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  theme_gmri() +
  labs(
    x = "Exponent of Size Spectra (b)",
    y = "Region",
    title = "GB + GOM Spectra Slopes Shallower than MAB & SNE\n
                Not significantly different within the pairs")

Code
# Visualization: line plots with p-values
get_emmeans(pwc) %>% 
  left_join(select(df, survey_area, region)) %>% 
  ggline(x = "survey_area", y = "emmean") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.2) + 
  stat_pvalue_manual(pwc, hide.ns = TRUE, tip.length = FALSE) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)) +
  labs(y = "Exponent of Size Spectra (b)", x = "Region")